The effect of nonpharmaceutical interventions on COVID-19 infections for lower and middle-income countries: A debiased LASSO approach

This paper investigates the determinants of COVID-19 infection in the first 100 days of government actions. Using a debiased LASSO estimator, we explore how different government nonpharmaceutical interventions affect new COVID-19 infections in 37 lower and middle-income countries (LMCs). We find that school closures, stay-at-home restrictions, and contact tracing reduce the growth of new infections, as do economic support to households and the number of health care workers. Notably, we find no significant effect of business closures. Finally, infections are higher in countries with greater income inequality, higher tourist inflows, less-educated adult populations, and weaker governance quality. We conclude that several policy interventions reduce infection rates in poorer countries. Further, economic and institutional factors are important, which justifies the use, and ultimately the success, of economic support to households during the initial infection period.


Introduction
The initial spread of COVID-19 infections presented a series of policy challenges for governments and public health authorities, particularly regarding the composition and possible magnitude of the non-pharmaceutical interventions available to policymakers. Of all possible options, which are likely to incur economic as well as political costs? Which ones are effective? This paper investigates the determinants of COVID-19 infection rates by looking specifically at the first 100 days of government actions to address the spread of COVID-19 infections. Analyzing the first 100 days is important for understanding how governments act to control the spread of the virus before the outbreak escalates. The first 100 days of the coronavirus challenge can be regarded as a "golden period" for government response, because the virus has not yet spread widely through the community. Indeed, evaluating public policy over "the first 100 days" has been common among policymakers since US President Franklin Roosevelt took office in 1933, when the economy was in a severe depression characterized by high unemployment and bank failures. [...] countries with weaker governance quality experience more infections. Although our sample pertains to recent COVID-19 infection rates, we believe that these results are generalizable and can be used to inform policy actions for other infectious diseases. We conclude that, for the most part, the policy interventions employed in poorer countries can reduce infection rates. Further, the economic and institutional environment is also important, thereby justifying the use, and ultimately the success, of economic support to households during the initial infection period. The paper is structured as follows: Section 2 provides a brief literature survey and some historical context. Section 3 presents the empirical model and estimation procedure. Section 4 presents the data and the results. Section 5 concludes.

Brief literature review and some history
Numerous studies show how infectious diseases such as the 1918 influenza, SARS, MRSA, and avian flu cause humanitarian disasters, create economic recessions, change demographic structures, and impose socio-economic costs on firms, households, and governments. However, only the 1918 influenza is comparable with COVID-19 in terms of the contagiousness of the virus, the limits of medical treatment, the effect on the respiratory system, and the similarity of government interventions, such as banning public events, locking down societies, and requiring masks (see [17]). Further, [18] indicate that the 1918 influenza represents the worst-case scenario for COVID-19 outcomes. Therefore, lessons from the great influenza pandemic can help policymakers control the current global crisis. We review public health policy during both the 1918 influenza and COVID-19 to understand how government responses affect pandemics and the expected health outcomes of these interventions.
It is shown in [19] how non-pharmaceutical interventions affected the weekly excess death rate during the 1918 influenza pandemic in the U.S. The authors collect weekly excess death rates for 43 cities over 24 weeks and analyze the impact of three interventions, namely school closures, cancellation of public gatherings, and isolation (quarantine), on the outcome of the pandemic. According to their findings, non-pharmaceutical interventions significantly mitigated the consequences of the pandemic in the United States. They also find that implementing non-pharmaceutical interventions early delays the peak in mortality. In [20], the effect of public restrictions such as school closures and social distancing on the 1918 influenza pandemic is analysed. He uses city-level U.S. data and applies the difference-in-differences (DiD) method to evaluate the economic and health benefits of non-pharmaceutical interventions. His medium-run findings reveal that a significant share of the people saved during the 1918 pandemic died in the following years. He argues that long-run social distancing probably lowers herd immunity in society, while noting that this policy reduces the death rate in the short run, especially when the death rate is at its peak. Moreover, [21] analyzes the impact of three public health interventions, namely school closures, prohibitions on public gatherings, and isolation, on the excess death rate from the 1918 influenza across 45 large U.S. cities. His findings confirm a negative and significant association between non-pharmaceutical interventions and the peak excess death rate: more interventions flattened the mortality curve during the 1918 pandemic. However, this effect is weak and insignificant for overall deaths. Barro concludes that government distancing measures during the pandemic probably delayed deaths rather than preventing them.
The epidemiological Susceptible-Infected-Recovered (SIR) model is extended in [22] to multiple age groups of people above 20 years old: the "young," the "middle-aged," and the "old." The authors show how different government interventions, such as testing and tracking and group-specific distancing, affect infections, deaths, and economic losses. Their specification allows them to evaluate the trade-off between saving lives and GDP when implementing public health interventions. Their findings reveal that keeping the adult death rate below 0.2 percent requires a strict economic lockdown lasting more than a year and a half, under which U.S. GDP shrinks by almost 40 percent for one year. They argue that the government should impose a strict and long lockdown on the most vulnerable group to control infections while maximizing economic benefits, and a less strict lockdown on young and middle-aged people as the low-risk groups. They conclude that reducing interactions between the elderly and other groups, and increasing testing and isolation of infected people, minimizes economic losses and deaths.
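To make the SIR mechanics concrete, the following is a minimal single-group sketch, not the multi-group model of [22]. It integrates the standard equations dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI with forward Euler steps. The function name `simulate_sir`, the transmission and recovery rates, the initial shares, and the 40 percent contact reduction standing in for a distancing policy are all hypothetical choices made for illustration.

```python
def simulate_sir(beta, gamma, days, s0=0.999, i0=0.001, dt=0.1):
    """Forward-Euler simulation of a single-group SIR model in population shares.

    beta: transmission rate per day; gamma: recovery rate per day.
    Returns the final (S, I, R) shares and the peak infected share.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return s, i, r, peak


# Hypothetical parameters: R0 = beta / gamma = 3 without intervention.
baseline = simulate_sir(beta=0.3, gamma=0.1, days=365)
# Distancing modelled, by assumption, as a 40% cut in the contact rate (R0 = 1.8).
lockdown = simulate_sir(beta=0.3 * 0.6, gamma=0.1, days=365)
print(f"peak infected share: baseline {baseline[3]:.3f}, lockdown {lockdown[3]:.3f}")
```

Lowering the effective contact rate flattens the infection curve; a multi-group version would give each age group its own contact rates and lockdown intensity.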
It is shown in [23] how economic and behavioral restrictions affected the spread of the coronavirus in New York City. He uses the fraction of COVID-19 tests yielding a positive result to measure the spread of the virus. The daily data cover 177 zip codes. He constructs new business activity and stay-at-home indices from smartphone data. The business activity index is based on the number of people who visited businesses in each zip code, and the stay-at-home index is the fraction of smartphones (people) that stayed at their home location during the pandemic. Further, [23] includes calendar-date and zip-code-of-residence fixed effects to capture unobserved heterogeneity over time and across zip codes. His findings reveal that the number of visits to local businesses is positively correlated with the positivity rate of COVID-19 tests, whereas staying at home lowers the positivity rate by 2 percentage points.
The effect of business shutdowns on COVID-19 deaths in Italy is investigated in [4]. The authors gather a substantial dataset covering 4,000 Italian municipalities in 222 local labor markets. They measure the business shutdown as the share of workers in non-essential activities, which were suspended due to COVID-19, in total employment. Other variables, such as the share of working-age females, the share of high school graduates, and population density, are used as controls. Their findings reveal that business shutdowns, especially in the retail trade and hospitality sectors, significantly reduce the COVID-19 death rate. Additionally, the results indicate that imposing closure restrictions one week earlier could have saved 25 percent of the lives lost in Italy. Further, [24] analyze whether public health interventions were effective in Europe during the first wave of the coronavirus pandemic. The data cover 11 European countries: Austria, Belgium, Denmark, France, Germany, Italy, Norway, Spain, Sweden, Switzerland, and the U.K. They use school closures, bans on public events, social distancing, and decreed lockdowns as measures of non-pharmaceutical intervention. Their findings demonstrate that economic lockdown had a large effect on reducing virus transmission in Europe.
Overall, most existing papers focus on a single determinant, or a single category of determinants, of infections. In the sections that follow, we adopt a data-driven approach that selects the most influential variables determining COVID-19 infection rates among LMCs.

Model and methodology
We specify a reduced-form panel data model, Eq (1), with many covariates for the first 100 days of the spread of the coronavirus. In Eq (1), Y is the number of new infections per 100,000 people; $X = (X_1, X_2, \ldots, X_n)$ is the vector of daily government interventions, such as school and business closures, stay-at-home restrictions, contact tracing, economic support to households, and tests per population; and $W = (W_1, \ldots, W_m)$ is the vector of country-level socio-economic factors, including population density, average years of adults' education, income inequality, international arrivals per population, and health workers per population, for the i-th country in the j-th continent in period t. Cou, Con and t denote the country, continent and time fixed effects; several studies show that including fixed or random effects in LASSO models yields more reliable estimates by capturing unobserved heterogeneity [25,26]. These fixed effects flexibly account for omitted variables within regions and over time. We include continent fixed effects to capture unobserved regional heterogeneity, although, as part of our robustness tests, we also run regressions without regional fixed effects. We normalize the variables to avoid scaling problems: [27] recommend normalization because the scale of a variable affects how strongly its coefficient is regularized. This transformation rescales each variable to zero mean and unit variance; it does not change the shape of the variable's distribution. Categorical variables are standardized in the same way, which allows us to interpret their coefficients like those of continuous regressors rather than relative to a baseline group, case, or condition. Further, because the variables are normalized, the model does not include an intercept.
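As a small illustration of the normalization step, the sketch below standardizes two hypothetical columns: an ordinal policy index coded 0 to 3 and a testing rate on a much larger scale. After the transformation both have zero mean and unit variance, so an $\ell_1$ penalty treats their coefficients on a common scale, while the policy index still takes only four distinct values. The use of the population standard deviation (`pstdev`) is an assumption; the sample standard deviation would serve equally well.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(x - mu) / sd for x in column]


# Hypothetical columns: an ordinal policy index (0-3) and a testing rate.
school_closure = [0, 1, 2, 3, 3, 2]
tests_per_1000 = [10.0, 55.0, 120.0, 300.0, 410.0, 90.0]

z_school = standardize(school_closure)
z_tests = standardize(tests_per_1000)
print([round(v, 3) for v in z_school])
```

Because every column, including Y, is centered, the regression needs no intercept, which matches the specification above.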
Note that these fixed effects do not create an identification problem in the model, because time-invariant variables such as education and income inequality vary from country to country. Therefore, the time-invariant variables and the country or continent fixed effects do not overlap. The parameters β, η, θ, φ and λ are vectors of slopes, and ε is the error term.
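Eq (1) itself does not survive in this version of the text. Assuming, for illustration only, that the slope vectors attach to the regressors in the order in which they are listed (β to X, η to W, θ to the country effects, φ to the continent effects and λ to the time effects), and that t stands for the set of time fixed effects rather than a linear trend, a plausible rendering is:

```latex
Y_{ijt} = X_{it}\beta + W_{i}\eta + \mathrm{Cou}_{i}\,\theta + \mathrm{Con}_{j}\,\varphi + t\,\lambda + \varepsilon_{ijt} \qquad (1)
```

There is no intercept because all variables are standardized.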
To present the debiased LASSO procedure, we first simplify our notation and write Eq (1) as

$$Y = Z\psi + \varepsilon, \qquad (2)$$

where Z is a p-dimensional vector of explanatory variables and ψ is the vector of corresponding coefficients. Following Zhang and Zhang (2014), we first decorrelate the vector of normalized explanatory variables, Z. For this purpose, we fit a LASSO regression of each explanatory variable, $Z_j$, on all other explanatory variables, $Z_{-j}$, whose dimension is p−1. The optimization problem is

$$\hat{\gamma}_j = \arg\min_{\gamma}\left\{\frac{\|Z_j - Z_{-j}\gamma\|_2^2}{2NT} + \zeta\|\gamma\|_1\right\}. \qquad (3)$$

The solution of Eq (3) satisfies the Karush-Kuhn-Tucker (KKT) optimality conditions. γ is the vector of coefficients corresponding to $Z_{-j}$, and ζ > 0 is a tuning parameter chosen by cross-validation. $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $\ell_1$ and $\ell_2$ norms, also known as the "absolute-value" and "Euclidean" norms: for a vector $x = (x_1, \ldots, x_p)$, $\|x\|_1 = \sum_{i=1}^{p}|x_i|$ and $\|x\|_2 = (\sum_{i=1}^{p}x_i^2)^{1/2}$. The residual vector of this regression is

$$M_j = Z_j - Z_{-j}\hat{\gamma}_j. \qquad (4)$$

Next, we combine the residual vector defined by Eq (4) with the response variable in Eq (2):

$$\hat{\psi}_j = \frac{M_j^{Tr}Y}{M_j^{Tr}Z_j} = \psi_j + \sum_{k\neq j}\frac{M_j^{Tr}Z_k}{M_j^{Tr}Z_j}\psi_k + \frac{M_j^{Tr}\varepsilon}{M_j^{Tr}Z_j}. \qquad (5)$$

PLOS ONE
Eq (5) shows that $\hat{\psi}_j$ is biased because $M_j$ is not exactly orthogonal to the other explanatory variables. The second term of the decomposition, $\sum_{k\neq j}\frac{M_j^{Tr}Z_k}{M_j^{Tr}Z_j}\psi_k$, is the bias arising from the correlation among the explanatory variables, since $M_j^{Tr}Z_k \neq 0$ for at least one $k \neq j$. It is worth noting that, in models with many covariates, correlation among the explanatory variables also biases the estimates of alternative regularization and variable selection methods, such as ridge regression and the elastic net [28,29]. In the last term, $M_j$ and ε are orthogonal to each other.
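The source of the bias in Eq (5) can be seen numerically. The sketch below, in plain Python with a hypothetical two-column design, fits the nodewise regression of Eq (3) by coordinate descent (a standard LASSO solver, not necessarily the software used for this paper's estimates) and forms the residual $M_j$ of Eq (4). The `alpha` argument plays the role of ζ with the same 1/(2NT) scaling as Eq (3). With a zero penalty the residual is orthogonal to the other regressor; with a positive penalty the KKT conditions force $|M_j^{Tr}Z_k| = NT\cdot\zeta$ for an active coordinate, which is exactly the non-orthogonality behind the bias term. All helper names are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-12):
    """Coordinate descent for (1/(2n)) * ||y - X w||_2^2 + alpha * ||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # running residual y - X w
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            # (1/n) X_j' times the partial residual that excludes coordinate j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_ms[j]
            step = new_wj - w[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                w[j] = new_wj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nodewise_residual(Z, j, alpha):
    """Eqs (3)-(4): M_j = Z_j - Z_{-j} gamma_hat_j, gamma_hat_j from a LASSO of Z_j on Z_{-j}."""
    z_j = [row[j] for row in Z]
    z_minus_j = [row[:j] + row[j + 1:] for row in Z]
    gamma = lasso_cd(z_minus_j, z_j, alpha)
    return [zj - dot(gamma, row) for zj, row in zip(z_j, z_minus_j)]


# Hypothetical design with two correlated columns (not the paper's data), n = 50.
Z = [[math.sin(i) + 0.5 * math.cos(i), math.cos(i)] for i in range(50)]
z_1 = [row[1] for row in Z]

m_ols = nodewise_residual(Z, 0, alpha=0.0)    # zero penalty: least squares
m_lasso = nodewise_residual(Z, 0, alpha=0.1)  # positive penalty, as in Eq (3)
print(dot(m_ols, z_1), dot(m_lasso, z_1))     # expected: close to 0, and close to n * alpha = 5.0
```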
In the second step, we regress Y on Z using the LASSO. The initial estimator solves

$$\hat{\psi}^{initial} = \arg\min_{\psi}\left\{\frac{\|Y - Z\psi\|_2^2}{2NT} + \zeta\|\psi\|_1\right\} \qquad (6)$$

and satisfies the corresponding KKT optimality conditions. We then subtract the bias term of Eq (5) from each component of the estimated coefficients, $\hat{\psi}^{initial}$. The debiased LASSO estimator is therefore

$$\hat{\psi}_j^{debiased} = \frac{M_j^{Tr}Y}{M_j^{Tr}Z_j} - \sum_{k\neq j}\frac{M_j^{Tr}Z_k}{M_j^{Tr}Z_j}\hat{\psi}_k^{initial}, \qquad (7)$$

where the bias term is computed by inserting the coefficients $\hat{\psi}_k^{initial}$ obtained from Eq (6). Every coefficient receives a specific value after debiasing, because the debiased estimate consists of two parts: even if $\hat{\psi}_j^{initial}$ is zero, the second part of Eq (7), $\sum_{k\neq j}(\cdot)$, is unlikely to be zero.
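A compact sketch of Eqs (6) and (7) follows, reusing a coordinate-descent LASSO (a hypothetical helper, not the software behind the reported estimates). It computes Eq (7) in the algebraically equivalent form $\hat{\psi}_j^{initial} + M_j^{Tr}(Y - Z\hat{\psi}^{initial})/M_j^{Tr}Z_j$. The design is hypothetical and noise-free with true coefficients (2, −1). Because the example has only two regressors, the nodewise penalty is set to zero, so $M_j$ is exactly orthogonal to the other column and the correction recovers the true coefficients from the shrunken LASSO fit; with a positive nodewise penalty the correction is only approximate.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-12):
    """Coordinate descent for (1/(2n)) * ||y - X w||_2^2 + alpha * ||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_ms[j]
            step = new_wj - w[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                w[j] = new_wj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def debiased_lasso(Z, y, alpha, alpha_node):
    """Initial LASSO (Eq 6), nodewise residuals (Eqs 3-4), bias correction (Eq 7)."""
    psi_init = lasso_cd(Z, y, alpha)
    resid = [yi - dot(psi_init, row) for yi, row in zip(y, Z)]
    psi_deb = []
    for j in range(len(Z[0])):
        z_j = [row[j] for row in Z]
        z_minus_j = [row[:j] + row[j + 1:] for row in Z]
        gamma = lasso_cd(z_minus_j, z_j, alpha_node)
        m_j = [a - dot(gamma, row) for a, row in zip(z_j, z_minus_j)]
        # Eq (7) rearranged: psi_init_j + M_j'(Y - Z psi_init) / M_j'Z_j
        psi_deb.append(psi_init[j] + dot(m_j, resid) / dot(m_j, z_j))
    return psi_init, psi_deb


# Hypothetical noise-free data: true psi = (2, -1).
Z = [[math.sin(i) + 0.5 * math.cos(i), math.cos(i)] for i in range(50)]
y = [2.0 * row[0] - 1.0 * row[1] for row in Z]
psi_init, psi_deb = debiased_lasso(Z, y, alpha=0.05, alpha_node=0.0)
print("initial LASSO:", psi_init)
print("debiased:     ", psi_deb)
```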
A framework to estimate regression variance for the debiased LASSO method is developed in [12].
The quantity $\tau_j = \|M_j\|_2 / |M_j^{Tr}Z_j|$ is known as the "noise factor." It is shown in [13] that this condition is satisfied even for misspecified models. Indeed, such inference allows researchers to calculate reliable confidence intervals and p-values for the estimates.
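The noise factor translates directly into standard errors. Under the usual normal approximation for debiased LASSO estimators, the standard error of $\hat{\psi}_j$ is approximately $\hat{\sigma}\tau_j$, which gives the confidence interval and two-sided p-value computed below. The residual vector, regressor, estimate, and $\hat{\sigma}$ are hypothetical inputs, and the 95 percent level is an illustrative choice.

```python
import math
from statistics import NormalDist


def noise_factor(m_j, z_j):
    """tau_j = ||M_j||_2 / |M_j' Z_j|."""
    return math.sqrt(sum(m * m for m in m_j)) / abs(sum(m * z for m, z in zip(m_j, z_j)))


def debiased_inference(psi_j, sigma_hat, tau_j, level=0.95):
    """Normal-approximation confidence interval and two-sided p-value."""
    se = sigma_hat * tau_j
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = math.erfc(abs(psi_j / se) / math.sqrt(2))  # = 2 * (1 - Phi(|t|))
    return (psi_j - z_crit * se, psi_j + z_crit * se), p_value


# Hypothetical inputs: a nodewise residual, the matching regressor, an estimate, sigma_hat.
m_j = [1.0, -1.0, 2.0, 0.0]
z_j = [2.0, 0.0, 1.0, 1.0]
tau = noise_factor(m_j, z_j)  # sqrt(6) / 4
(lo, hi), p = debiased_inference(psi_j=0.3, sigma_hat=0.5, tau_j=tau)
print(f"tau = {tau:.4f}, 95% CI = ({lo:.3f}, {hi:.3f}), p = {p:.3f}")
```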

Data, estimation, and results
Our dataset covers 37 LMCs across four geographic regions: 11 African, 12 Asian, 8 Latin American and Caribbean, and 6 European countries (see Appendix A in S1 Appendix). The definitions and sources of the data, including measures of interventions and nonpharmaceutical instruments, government support, socio-economic conditions, and health care variables, are reported in Appendix B in S1 Appendix, and a scatterplot is presented in Appendix F in S1 Appendix. We employ a low-dimensional debiased LASSO method (see [12]) since the number of explanatory variables, 12 excluding the continent and country fixed effects, is less than the number of observations, 3,700 (37 countries × 100 days).
This paper considers governance quality as one of the major predictors of the COVID-19 infection rate, in order to understand how good or bad governance contributes to controlling the infection rate. One limitation of this research is that we did not have access to up-to-date data for this variable. Accordingly, we construct a governance quality index following [30] by combining six governance dimensions via principal component analysis (PCA). For policy analysis, this single index is more informative because it ranks low and middle-income countries by governance quality. As such, countries can ascertain how their performance in governing contributes to combating new diseases. Details of the data and method are given in Appendix C in S1 Appendix. We also provide a table of descriptive statistics of all variables, including the governance quality indicator, and a correlation matrix for the key variables in Appendix D in S1 Appendix.
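A minimal sketch of a PCA-based index, assuming six governance dimensions coded so that higher values mean better governance: the columns are standardized, the leading eigenvector of their correlation matrix is found by power iteration, and each country's index is its first principal component score. The five countries and their values are made up, and the procedure in Appendix C may differ, for example in weighting or sign conventions.

```python
import math
from statistics import mean, pstdev


def standardize_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    stats = [(mean(c), pstdev(c)) for c in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def first_principal_component(rows, n_iter=1000, tol=1e-12):
    """Loadings and scores of the first PC of the correlation matrix (power iteration)."""
    z = standardize_columns(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    if sum(v) < 0:  # sign convention: a higher score means better governance
        v = [-x for x in v]
    scores = [sum(x * l for x, l in zip(row, v)) for row in z]
    return v, scores


# Hypothetical scores of five countries on six governance dimensions.
rows = [
    [-1.2, -0.9, -1.0, -1.1, -1.3, -1.0],
    [-0.5, -0.2, -0.4, -0.3, -0.6, -0.5],
    [0.1, 0.3, 0.0, 0.2, 0.1, -0.1],
    [0.6, 0.4, 0.7, 0.5, 0.4, 0.6],
    [1.0, 0.8, 1.1, 0.9, 1.2, 1.0],
]
loadings, index = first_principal_component(rows)
print([round(s, 3) for s in index])
```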
Table 1 summarizes the results of nine different models. Model 1 is the baseline regression, in which government nonpharmaceutical interventions explain the infection rate. Model 2 adds government support to households. Models 3 to 8 add the other factors one at a time to show the robustness of the estimates as the controls change. Model 9 includes all policy variables and socio-economic covariates.
[Table 1 fragment: Economic support to households, -0.0029]

Our findings strongly suggest that school closures and contact tracing significantly reduce the infection rate in LMCs. The results also confirm that testing and infection rates are positively correlated. More interestingly, the economic support package to households is negatively associated with the infection rate. Notably, however, there is no evidence of a significant correlation between business closures and infection rates, which calls into question the efficacy of that policy for LMCs.
Since the variables are standardized, the coefficients are interpreted in terms of standard deviations (see [31]). In Model 1, a one standard deviation increase in the school closure index, holding other variables fixed, leads to a 0.056 standard deviation decrease in the infection rate. Similarly, a one standard deviation increase in the contact tracing index leads to a 0.135 standard deviation decrease in the infection rate, and the stay-at-home restriction reduces infections by 0.108 standard deviations. Regarding testing, a one standard deviation increase in the testing rate, holding other variables fixed, increases the infection rate by 0.4 percent of a standard deviation.
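Because the coefficients are in standard-deviation units, converting one back to original units only requires the two standard deviations: $\beta = \beta^{std}\,\mathrm{sd}(Y)/\mathrm{sd}(X)$. The sketch applies this to the Model 1 contact tracing coefficient; both standard deviations are hypothetical placeholders, not values from the data.

```python
def unstandardize(beta_std, sd_x, sd_y):
    """Convert a standardized slope into original units: beta = beta_std * sd(Y) / sd(X)."""
    return beta_std * sd_y / sd_x


# Contact tracing coefficient from Model 1 (-0.135 SD of Y per SD of X).
sd_infection_rate = 2.0   # new infections per 100,000 (hypothetical)
sd_contact_tracing = 0.8  # contact tracing index points (hypothetical)
effect = unstandardize(-0.135, sd_contact_tracing, sd_infection_rate)
print(f"{effect:.4f} new infections per 100,000 per index point")
```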
Model 2 presents the coefficient for government support to households. The incidence of infection decreases by 0.3 standard deviations when household economic support increases by one standard deviation, holding all other variables unchanged. In Model 3, a one standard deviation increase in the number of nurses and doctors decreases the infection rate by 0.045 standard deviations. This finding is consistent with [5] and [32], which show that an efficient health care system can contribute to reducing cases.
Model 4 shows that adults' average years of schooling are negatively correlated with the infection rate, in line with [33], which examines the relationship between schooling and health-related behaviours (an "education gradient"). Accordingly, we expect the virus to be less prevalent in more educated countries. Holding other variables constant, a one standard deviation increase in adults' average years of schooling results in a 0.027 standard deviation decrease in infection rates.
Model 5 shows the effect of income inequality on the spread of the virus. The relationship between income inequality and the infection rate is positive, consistent with [1] and [34]. In countries with extreme income inequality, many people may be unable to afford primary health care. Holding other factors constant, the infection rate increases by 0.014 standard deviations when income inequality increases by one standard deviation.
Model 6 reports the response of the infection rate to population density. The infection rate increases by 0.02 percent of a standard deviation when population density increases by one standard deviation, consistent with [6]. The magnitude of the coefficient is small, although it is statistically significant. This may be because a significant proportion of the population lives in rural areas with low population density.
Model 7 shows the effect of tourist arrivals on the COVID-19 infection rate. The standardized coefficient reveals that a one standard deviation increase in tourist arrivals raises the infection rate by 0.126 standard deviations. A very similar result is found in [3]. This coefficient is larger than those of the other socio-economic covariates, suggesting that controlling tourist arrivals may have allowed governments to prevent the spread of the virus before implementing national policies such as border closures. Border closures were not implemented until well into the first 100 days of COVID-19 cases, so there was still a significant flow of tourists during that period. See https://www.bsg.ox.ac.uk/research/research-projects/covid-19-government-response-tracker and https://www.bbc.com/news/world-52103747.
Model 8 shows the importance of the quality of public institutions in controlling COVID-19 outbreaks. The estimated coefficient reveals that the infection rate falls by a 0.147 standard deviation when governance quality rises by one standard deviation, keeping other variables fixed. This result is consistent with [35] and [2], highlighting the role of good governance.
Model 9 provides broad estimates of coefficients when all policy variables and socio-economic controls are included in the model. The results for this regression are consistent with the above results.

Robustness checks
We refit the model with two alternative debiased LASSO methods, the scaled debiased LASSO and the residual bootstrapping debiased LASSO, to check the consistency of the estimates. Both approaches are derived from Eq (6) with a few adjustments. The former modifies the penalty function to estimate the regression's noise level along with the slopes. The latter resamples the residuals obtained from Eq (6) and thereby approximates the empirical distribution of the outcome variable, which provides more accurate confidence intervals for the estimates [36]. It is stated in [37] that the bootstrapping technique also leads to a more precise variable selection procedure.
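The scaled LASSO objective, $\sigma/2 + \|Y - Z\psi\|_2^2/(2\sigma NT) + \zeta_0\|\psi\|_1$, can be minimized by alternating between its two blocks of unknowns: for fixed ψ it is minimized at $\sigma = \|Y - Z\psi\|_2/\sqrt{NT}$, and for fixed σ it is a standard LASSO with penalty $\sigma\zeta_0$. The sketch below follows that scheme with a coordinate-descent LASSO; the simulated data, the value ζ0 = 0.1, and all helper names are hypothetical. Rescaling Y by 10 rescales both ψ and σ by 10, which is the scale invariance that lets ζ0 be fixed in advance rather than cross-validated.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-12):
    """Coordinate descent for (1/(2n)) * ||y - X w||_2^2 + alpha * ||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_ms[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_ms[j]
            step = new_wj - w[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                w[j] = new_wj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return w


def scaled_lasso(Z, y, zeta0, n_iter=100, tol=1e-10):
    """Jointly estimate (psi, sigma) by alternating the two exact block minimizations."""
    n = len(y)
    sigma = math.sqrt(sum(v * v for v in y) / n)  # noise estimate at psi = 0
    psi = [0.0] * len(Z[0])
    for _ in range(n_iter):
        psi = lasso_cd(Z, y, sigma * zeta0)  # fixed sigma: LASSO with penalty sigma * zeta0
        resid = [yi - sum(c * x for c, x in zip(psi, row)) for yi, row in zip(y, Z)]
        new_sigma = math.sqrt(sum(r * r for r in resid) / n)  # fixed psi: optimal sigma
        if abs(new_sigma - sigma) < tol:
            return psi, new_sigma
        sigma = new_sigma
    return psi, sigma


rng = random.Random(42)  # hypothetical simulated data
Z = [[math.sin(i), math.cos(i)] for i in range(60)]
y = [1.5 * row[0] + rng.gauss(0.0, 0.3) for row in Z]

psi, sigma = scaled_lasso(Z, y, zeta0=0.1)
psi10, sigma10 = scaled_lasso(Z, [10.0 * v for v in y], zeta0=0.1)
print(psi, sigma)
print(psi10, sigma10)  # roughly 10 times the values above
```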
The scaled debiased LASSO is constructed following [38,39,40] to include both the noise level and the coefficients in the optimization. The scaled LASSO jointly estimates the regression coefficients and the noise level σ in a linear model by minimizing

$$\frac{\sigma}{2} + \frac{\|Y - Z\psi\|_2^2}{2\sigma NT} + \zeta_0\|\psi\|_1$$

over (ψ, σ). Under this scale-invariant formulation, the penalty level is proportional to the noise level of the regression model. Recall that, in the standard LASSO, the optimal penalty parameter depends on the scale of the error, so it is mainly chosen by cross-validation. The scaled LASSO instead offers scale-free penalty parameters that are predetermined from purely theoretical considerations (see [41]), and it automatically adjusts the penalty level to achieve optimal convergence (see [42]). Although this approach uses an alternative objective function, it still needs to be debiased like the conventional LASSO in [12]. It is shown in [41] that the scaled LASSO performs poorly when predictors are strongly correlated with each other. The results of the scaled debiased LASSO are reported in Table 2.

As another robustness check, we implement a residual bootstrapping method for the debiased LASSO regression. As described in [43], this process starts by estimating the coefficients with the conventional LASSO. The initial residuals are then calculated as $\hat{\varepsilon} = Y - Z\hat{\psi}$, and the centered residuals are computed as $\hat{\varepsilon}^{centered}_{it} = \hat{\varepsilon}_{it} - \bar{\varepsilon}$ for $i = 1, \ldots, N$ and $t = 1, \ldots, T$, where the mean residual is $\bar{\varepsilon} = (NT)^{-1}\sum_{t=1}^{T}\sum_{i=1}^{N}\hat{\varepsilon}_{it}$. Note that, unlike in OLS, the mean of the LASSO residuals is not necessarily zero, because the penalty used to regularize the parameters differs from the OLS criterion.
The bootstrapped errors $(\varepsilon^*_{11}, \ldots, \varepsilon^*_{NT})$ are then drawn with replacement from the centered residuals, and the bootstrapped response variable is generated as

$$Y^* = Z\hat{\psi} + \varepsilon^*.$$

Here, Z is treated as non-random, so the sample has a fixed design. [43] show that the estimates obtained from the residual bootstrapping debiased LASSO method are asymptotically consistent, and [44] indicates that they are more efficient than those of other debiased methods. Table 3 reports the residual bootstrapping debiased LASSO estimates based on 500 replications. All coefficients obtained from this method are in line with those obtained from the debiased LASSO method. The key point is that the results of the different debiased LASSO methods are close to each other. However, the initial estimates from the conventional LASSO, scaled LASSO and bootstrapping LASSO are somewhat different. Appendix E in S1 Appendix reports the initial estimates for Model 9 under these different approaches.
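The resampling steps can be sketched in plain Python. To keep the example short, a one-regressor least-squares fit stands in for the LASSO fit; in the paper the same resampling is wrapped around the LASSO and the debiasing step. The data, seed, and helper names are hypothetical; the 500 replications mirror the number used for Table 3.

```python
import math
import random


def ols_through_origin(Z, y):
    """Stand-in estimator: least squares for one regressor without intercept."""
    num = sum(row[0] * v for row, v in zip(Z, y))
    den = sum(row[0] ** 2 for row in Z)
    return [num / den]


def residual_bootstrap(Z, y, fit, n_boot=500, seed=0):
    """Residual bootstrap with a fixed design Z."""
    rng = random.Random(seed)
    psi_hat = fit(Z, y)
    fitted = [sum(c * x for c, x in zip(psi_hat, row)) for row in Z]
    resid = [a - b for a, b in zip(y, fitted)]
    mean_resid = sum(resid) / len(resid)        # epsilon bar
    centered = [r - mean_resid for r in resid]  # centered residuals
    replicates = []
    for _ in range(n_boot):
        eps_star = rng.choices(centered, k=len(centered))   # draw with replacement
        y_star = [f + e for f, e in zip(fitted, eps_star)]  # Y* = Z psi_hat + eps*
        replicates.append(fit(Z, y_star))
    return psi_hat, replicates


def percentile_interval(values, level=0.95):
    """Simple percentile interval from bootstrap replicates."""
    ordered = sorted(values)
    lo = int((1 - level) / 2 * (len(ordered) - 1))
    hi = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo], ordered[hi]


noise = random.Random(1)
Z = [[math.sin(i)] for i in range(60)]
y = [2.0 * row[0] + noise.gauss(0.0, 0.5) for row in Z]
psi_hat, reps = residual_bootstrap(Z, y, ols_through_origin, n_boot=500, seed=0)
lo, hi = percentile_interval([r[0] for r in reps])
print(f"estimate {psi_hat[0]:.3f}, 95% percentile interval ({lo:.3f}, {hi:.3f})")
```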
The findings show that the estimates of the conventional and bootstrapping LASSO models are similar in terms of variable selection, with slight differences in the magnitude of the coefficients. The results of the scaled LASSO differ from those of the other two techniques: it selects more variables, and the coefficients it selects are slightly larger. These differences are due to the different penalty functions used to select covariates. Further, we check the robustness of the models by changing the sample and implementing an extreme bounds analysis (EBA). Following [45], we first change the sample through a leave-one-out procedure and re-estimate the models to ensure that the estimates are robust and not sensitive to the sample used. For example, we remove the first country, Argentina, from our sample and re-estimate the coefficients of Models 1 to 9. We then repeat the estimation after removing the second country and restoring the first [...] are the lower and upper extreme bounds, respectively. This approach is similar to [46] because the extreme bounds are constructed from the average values of coefficients obtained from different design matrices. He considers different combinations of variables, computes the average value of each coefficient, and then establishes the lower and upper extreme bounds for the estimates. Likewise, we calculate these bounds from the average values of the coefficients obtained by altering the sample. In this sense, a coefficient is fragile when the lower extreme bound is negative while the upper extreme bound is positive. Table 4 reports the average coefficient values and the corresponding lower and upper bounds.
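A sketch of the leave-one-out loop and the fragility check. The bound formula in the text above did not survive extraction, so this example assumes, purely for illustration, bounds of the form mean ∓ 2 standard deviations of the leave-one-out estimates. The stand-in slope estimator, the four countries, and their data are also hypothetical.

```python
from statistics import mean, stdev


def slope_through_origin(rows):
    """Stand-in estimator: least-squares slope of y on x without intercept."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def leave_one_out(groups, estimate):
    """Re-estimate after dropping each group (country) in turn."""
    results = {}
    for dropped in sorted(groups):
        rows = [r for name, g in groups.items() if name != dropped for r in g]
        results[dropped] = estimate(rows)
    return results


def extreme_bounds(estimates, k=2.0):
    """ASSUMED rule, for illustration only: mean -/+ k standard deviations across runs."""
    m, s = mean(estimates), stdev(estimates)
    return m - k * s, m + k * s


def is_fragile(lower, upper):
    """A coefficient is fragile when its bounds straddle zero."""
    return lower < 0.0 < upper


# Hypothetical (x, y) observations for four made-up countries.
groups = {
    "A": [(1.0, -2.1), (2.0, -3.9)],
    "B": [(1.5, -3.2), (0.5, -0.9)],
    "C": [(2.5, -5.2), (1.0, -1.8)],
    "D": [(3.0, -6.1)],
}
runs = leave_one_out(groups, slope_through_origin)
lower, upper = extreme_bounds(list(runs.values()))
print(runs, (lower, upper), "fragile" if is_fragile(lower, upper) else "robust")
```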
Judging by the magnitudes and signs of the coefficients, the values in Table 2 are broadly similar to those in Table 1: all estimates related to government interventions except business closures are robust, and this holds across the different models. The socio-economic factors associated with the spread of the virus are also robust across all models, as none of the constructed intervals contains zero.

Conclusions
We examine the effects of several government nonpharmaceutical interventions, as well as the socio-economic environment on the spread of the virus in LMCs. We take a data-driven and multivariate approach in order to identify the important factors determining infection rates.
To handle a dataset with many correlated predictors, we apply a debiased LASSO method along with several robustness exercises. Our findings suggest that school closures and contact tracing are the most effective of the government responses to the spread of COVID-19. Curiously, we find that business closures have no statistically significant effect on infection rates. Further, our results reveal that economic support to households and the number of health care workers reduce the spread of the virus.
Finally, population density, income inequality, and tourist arrivals contribute to infections in these countries, whereas adults' average years of education and governance quality are associated with lower infection rates.
Closing schools and universities, requiring people to stay at home, and tracing the contacts of infected individuals are all effective policy interventions. Further, strengthening public institutions and increasing the number of health care workers are vital to helping these countries overcome this health crisis.
While this paper uses a sample of infection rates from the recent COVID-19 pandemic, we believe that these results can be generalized to inform policy actions for other infectious diseases. We conclude that, for the most part, a range of policy interventions can reduce infection rates in poorer countries. Further, socio-economic factors and the economic and institutional environment also affect the spread of infections. As a consequence, we believe that this justifies the use, and ultimately the success (as shown in our analysis), of economic support to households during the initial infection period.